I consider the tradeoff between the information gained about an initially
unknown quantum state, and the disturbance caused to that state by the
measurement process. I show that for any distribution of initial states, the
information-disturbance frontier is convex, and disturbance is nondecreasing
with information gain. I consider the most general model of quantum measurements,
and all post-measurement dynamics compatible with a given measurement.
For the uniform initial distribution over states, I show that an optimal
information-disturbance combination may always be achieved by a measurement
procedure which satisfies a generalization of the projection postulate, the
"square-root dynamics." I use this to show that the information-disturbance
frontier for the uniform ensemble may be achieved with "isotropic" (unitarily
covariant) dynamics. This results in a significant simplification of the
optimization problem for calculating the tradeoff in this case, giving hope for a
closed-form solution. I also show that the discrete ensembles uniform on the
$d(d+1)$ vectors of a certain set of $d+1$ "mutually unbiased" or conjugate
bases in $d$ dimensions form spherical 2-designs in $\mathbb{CP}^{d-1}$ when $d$ is a power of
an odd prime. This implies that many of the results of the paper apply also
to these discrete ensembles.



I. INTRODUCTION 

In this paper, I consider one of the salient ways in which quantum information differs 
from classical information. In classical information theory, we may in principle determine the
state of a system arbitrarily accurately with arbitrarily little disturbance to that state. By
contrast, in quantum mechanics any measurement which allows one to obtain information
about the state of a quantum system must, on average, disturb that state, except in special
cases. The special cases are when the possible states of the system are known in advance to lie 
in one or the other of two or more orthogonal subspaces — then the information about which 
of the orthogonal subspaces the state lies in can be extracted without disturbance. This fact 
underlies some important applications of quantum mechanics in information processing, 
notably quantum key distribution |3J || and other forms of quantum cryptography, as well 
as some applications to algorithms, such as the proof that PSPACE has constant-round 
quantum interactive proof systems 0. The goal of this paper is to quantify the tradeoff 
between information gained and disturbance to the system, and derive general features of 
that tradeoff. 

In introductory presentations of quantum theory, it is often stated that when a quantum 
system is measured and a result uniquely associated with a particular eigenvector of the 
measured observable is obtained, the system state "collapses" to that eigenvector. This is 
usually known as "the projection postulate," and attributed to von Neumann [|J. It clearly 
represents a disturbance to the system's state, unless the system is already in an eigenstate 
of the measured observable. A generalization of the projection postulate to observables 
with degenerate eigenspaces is known as "Lüders' rule;" it is slightly different from von
Neumann's proposed post-measurement dynamics for that situation. Lüders' rule says that
upon a measurement yielding result $b$ corresponding to a projector $\Pi_b$ (onto a degenerate
eigenspace of the observable) an initial density operator $\rho$ evolves to

$$ \rho \mapsto \frac{\Pi_b \rho \Pi_b}{p_b} , $$

where $p_b = \mathrm{tr}\, \rho \Pi_b$ is the probability of obtaining result $b$ @, 0. Hence the
after-measurement unconditional density operator becomes $\rho' := \sum_b \Pi_b \rho \Pi_b$. But in fact this
postulate describes only one of the many possible ways in which a physical process of mea- 
surement may affect a system. I will call measurements in which the effect on the system 
is described by the Lüders' rule form of the projection postulate projective measurements.
Von Neumann's proposal, that the post-measurement density matrix conditional on observing
the $b$-th outcome becomes $\Pi_b / \mathrm{tr}\, \Pi_b$, is another potential post-measurement dynamics
which is consistent with quantum theory. (Lüders' rule, however, is a more appropriate
candidate for a "generalized projection postulate," since it describes the conditional dy- 
namics of measurement via a projection.) In the next section, I will review a more general 
description both of measurements (as Positive Operator Valued Measures (POVM's)) and 
of their effects on the system (as a collection of trace-decreasing completely positive maps, 
or quantum operations summing to a trace-preserving map). In this paper, I will generalize 
the projection postulate to POVM's. There are many collections of operations which are 
consistent with a given measurement. I show that this generalized "projection" postulate 
selects the set of quantum operations which is, on average, least disturbing to an initially 
completely unknown input state. I then investigate the tradeoff between information gained 
in a measurement, and expected disturbance of a completely unknown initial state. This 
tradeoff is a quantitative expression of one of the most salient and distinctive features of 
quantum mechanics: that measurement disturbs a quantum mechanical system. 
II. QUANTUM MEASUREMENTS AND OPERATIONS



A very general characterization of physically realizable measurement processes is: allow 
the system to be measured to interact unitarily with another system, often termed the 
"ancilla", which starts out in some standard state. Then measure some set of orthogonal 
projectors on the ancilla. The outcomes of this measurement may provide information 
about the system, and therefore may be considered to be the results of a measurement on 
the system. There is no need to consider the effect of this measurement on the ancilla if 
one is only interested in the system, for whether the projection postulate, or some other 
rule, describes what happens to the ancilla, is not relevant to what happens to the system. 
The probabilities of the various results of this measurement, and the associated change in 
the system density operator, may be described solely in terms of the system itself, via the 
formalism of Positive Operator Valued Measures (POVM's) and associated "operations."

A discrete POVM is a set of positive operators $F_b$ indexed by positive integers, say, such
that

$$ \sum_b F_b = I , $$

and the probability of obtaining the measurement result with index $b$ is $\mathrm{tr}\, \rho F_b$. For a standard
measurement of a Hermitian observable on the system, the $F_b$ are just the projectors onto the
eigenspaces of the observable. Such a measurement of projectors is often called "projection-valued"
(not to be confused with a "projective" measurement as defined above). I will often
call the elements $F_b$ of a POVM "effects," following Ludwig [7] and Kraus ||. We will also
have some occasion to use continuously indexed POVM's, corresponding to a continuum of
possible measurement results. These may be loosely thought of as a continuously indexed
set of "infinitesimal" positive operators $d\mu(\alpha) F_\alpha$, such that $\int d\mu(\alpha) F_\alpha = I$. The probability
that $\alpha$ lies in a Borel set $A$ is then given by $\mathrm{tr}\, \rho \int_A d\mu(\alpha) F_\alpha$.

I believe that confining our attention to discrete, indeed finitely indexed, POVMs and
instruments results in no loss of generality. Arguments similar to, but more involved than,
those of Davies || and of Ozawa [10] (who treat the maximal information without a disturbance
constraint) should show that the optimal information for a given disturbance can
always be achieved with a POVM having a finite number of outcomes (bounded in advance
by a polynomial in the dimension of Hilbert space) even when we initially vary over more
general sets of physically reasonable POVMs. Since this promises to be rather technical, it
will be worked out elsewhere. Nevertheless, in Section VII it will be useful to use a continuously
indexed POVM rather than discrete ones achieving the same information-disturbance
combination, because of the continuous POVM's greater symmetry.

The general form for the post-measurement quantum state (density operator) conditional
on obtaining the result $b$ for a measurement of a POVM consisting of operators $F_b$ is [11]

$$ \rho'_b = \mathcal{A}_b(\rho) = \sum_i A_{bi} \rho A_{bi}^\dagger , \tag{2} $$

where the $A_{bi}$ satisfy

$$ \sum_i A_{bi}^\dagger A_{bi} = F_b . \tag{3} $$

The linear map $\mathcal{A}_b$, often referred to as an operation, will be said to have a Hellwig-Kraus
(HK) decomposition, or simply a decomposition, $\{A_{bi}\}_i$; I will often write this $\mathcal{A}_b \sim \{A_{bi}\}_i$.
Note that here and below I use a convention for ensembles or sets denoted by expressions
within curly brackets. The convention is that when we put part of the expression within
the brackets as a subscript of the right-hand bracket, the overall expression refers to the
ensemble given by the expression within brackets, when only the subscripted piece varies.
Thus for example $\{p_{ij}\}_j$ refers to the ensemble of the $p_{ij}$ for various $j$ and fixed $i$. This is,
therefore, the $i$-th in a list of ensembles indexed by $i$. (Somewhat irregularly, when there
would only be one subscript and it already appears as the sole subscript of the expression
within brackets, I will omit it outside the brackets; thus $\{F_b\}$ means $\{F_b\}_b$.) I will sometimes
refer to the operators of a decomposition as HK operators.

Define $\mathcal{A} := \sum_b \mathcal{A}_b$ (so that $\mathcal{A}(\rho) = \sum_{bi} A_{bi} \rho A_{bi}^\dagger$). This is the overall operation if one does
not know the measurement result; it is trace-preserving. Notice that $\rho'_b$ is unnormalized,
and its trace gives the probability of the measurement outcome. As usual, I denote a
normalized version of an operator with a hat:

$$ \hat{\rho}'_b := \frac{\rho'_b}{\mathrm{tr}\, \rho'_b} . \tag{4} $$

I will say that an operation $\mathcal{A}$ is compatible with a POVM $\{F_b\}$ if there exists an HK
decomposition $\{A_{bi}\}$ of $\mathcal{A}$ such that (3) holds. The collection of operations $\mathcal{A}_b$ defined by
$\mathcal{A}_b \sim \{A_{bi}\}_i$ is often referred to as an instrument for the POVM [13]. When an operation $\mathcal{A}$
is viewed as an instrument for a compatible POVM $\mathcal{E} = \{F_b\}$, I will sometimes call this the
procedure $(\mathcal{E}, \mathcal{A})$; this is equivalent to the instrument $\{\mathcal{A}_b\}$. If we use the polar decomposition
$A_{bi} = U_{bi} P_{bi}$ ($P$ positive, $U$ unitary), then we have that $F_b = \sum_i P_{bi}^2$. If $P_{bi}$ does not vary
with $i$, then all the $P_{bi}$ are proportional to $F_b^{1/2}$, and with $b$ known the value of $i$ contains
no additional information about the initial state. If $P_{bi}$ does vary with $i$, then the value of
$i$ represents further information that is not gathered by the POVM $\{F_b\}$, but which could
have been gathered via a POVM $\{P_{bi}^2\}$ consistent with the same operation. In fact, one can
construct a physical realization of this operation (unitary evolution on system plus ancilla
followed by projective measurement on the ancilla) such that measuring $F_b$ instead of $P_{bi}^2$
just corresponds to coarse-graining the projective measurement on the ancilla by grouping
projectors together to form higher-dimensional ones. One might expect that the potential
for gathering more information will remove more quantum coherence, and result in more
disturbance of the post-measurement state. The $U_{bi}$ may be thought of as unitary operations that
the system undergoes conditional on measurement outcomes $b$ and (if they vary with $i$) on
potential measurement outcomes $i$ which are not gathered by the POVM $\{F_b\}$ but which
are nevertheless available to the apparatus, so that the further evolution of the state may be
conditioned on them. If the $U_{bi}$ vary with $i$ while $P_{bi}$ does not, then these further "potential
measurement outcomes" carry no information about the pre-measurement system state, and
simply represent a stochastic resetting of the state which is not conditioned on any further
information about the state: a further noisy disturbance of the state.

A natural generalization of a projective measurement is to have a single value of $i$ in the
above sum, and let $A_b = F_b^{1/2}$, so that the unnormalized conditional density operator and
the unconditional post-measurement density operator are given by:

$$ \rho'_b = F_b^{1/2} \rho F_b^{1/2} , \qquad \rho' = \sum_b F_b^{1/2} \rho F_b^{1/2} . \tag{5} $$

I will say that such measurement procedures exhibit "the square-root conditional dynamics,"
and call the associated operation the square-root operation for that measurement. Sometimes
I will call this "the square-root measurement procedure," although care should be
taken not to confuse this with the "pretty-good measurement", which some authors [14]
call the "square-root measurement". In part because of the polar decomposition of the $A$'s
just discussed, we may view any measurement of $\{F_b\}$ as beginning with the performance of
the square root conditional dynamics, followed, possibly, by further conditional operations;
this provides one (rather weak) motivation for thinking of the square root dynamics as the
"minimal disturbance" one is compelled to cause. (It is a weak motivation because the
subsequent conditional dynamics can, for some ensembles, be chosen to on average repair
some of the square root measurement's damage to the initial state.) Even in the case of
projection-valued measures, the square-root operation is a very special case, in which the
unitaries $U_{bi}$ are all the identity $I$ (up to an irrelevant phase) and for each $b$ there is only one
$A_b$, which in this case will just be the projector corresponding to the measurement outcome.
None of the freedom to add noise by further conditional unitary operations, or to further
disturb the state by effectively collecting extra information which is then thrown away, is
used in a square-root measurement procedure.
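As a concrete illustration of the square-root conditional dynamics, the following minimal sketch applies $A_b = F_b^{1/2}$ to a qubit. The particular POVM (an "unsharp" two-outcome measurement $F_\pm = (I \pm \eta\sigma_x)/2$ with $\eta = 0.6$), the input state, and all function names are assumptions chosen for illustration; they do not come from the text. The closed-form $2\times 2$ square root is a standard consequence of the Cayley-Hamilton theorem.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(a):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]

def trace(a):
    return a[0][0] + a[1][1]

def sqrt_psd_2x2(m):
    """Square root of a nonzero 2x2 positive semidefinite matrix.

    Closed form sqrt(M) = (M + s I) / t, with s = sqrt(det M) and
    t = sqrt(tr M + 2 s).
    """
    det = complex(m[0][0] * m[1][1] - m[0][1] * m[1][0]).real
    s = math.sqrt(max(det, 0.0))
    t = math.sqrt(complex(trace(m)).real + 2.0 * s)
    return [[(m[i][j] + (s if i == j else 0.0)) / t for j in range(2)]
            for i in range(2)]

# Hypothetical example POVM: F_+ and F_- sum to the identity.
eta = 0.6
povm = [
    [[0.5, eta / 2], [eta / 2, 0.5]],
    [[0.5, -eta / 2], [-eta / 2, 0.5]],
]
kraus = [sqrt_psd_2x2(f) for f in povm]   # A_b = F_b^{1/2}

rho = [[1.0, 0.0], [0.0, 0.0]]            # input state |0><0|

# Unnormalized conditional states rho'_b = A_b rho A_b^dagger, as in Eq. (5).
rho_cond = [matmul(matmul(a, rho), dagger(a)) for a in kraus]
probs = [complex(trace(r)).real for r in rho_cond]

# Unconditional post-measurement state rho' = sum_b rho'_b.
rho_out = [[rho_cond[0][i][j] + rho_cond[1][i][j] for j in range(2)]
           for i in range(2)]

# Compatibility check of Eq. (3): sum_b A_b^dagger A_b should equal I.
completeness = [[sum(matmul(dagger(a), a)[i][j] for a in kraus)
                 for j in range(2)] for i in range(2)]
```

In this example the traces of the conditional states reproduce the outcome probabilities $\mathrm{tr}\,\rho F_b$, and the unconditional output state loses some of its weight on $|0\rangle$, which is the disturbance the rest of the paper quantifies.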



III. DISTURBANCE MEASURES AND THE INFORMATION-DISTURBANCE FRONTIER

In light of the general formulation of quantum measurement and its effect on a system, the
question arises: is there anything special about the projection postulate, and more generally
about the Lüders type of measurement? It is sometimes said, in the context of nondegenerate
Hermitian observables, that it is the "least disturbing" type of measurement, since when
the measurement is immediately repeated, one gets the same value of the observable with
certainty. However, this only means that it doesn't disturb its own eigenstates. Other states
certainly are disturbed, by projection onto the eigenstates of the observable, and it behooves
us to ask whether this disturbance is in any sense minimal. If so, one would also like to know
whether $F_b^{1/2}$ is the minimal-disturbance generalization to POVMs. There, it is no longer
necessarily true that repeating the measurement is guaranteed to give the same result when
the operation is $F_b^{1/2}$. (There is no conditional dynamics which can provide this guarantee
in the case of nonorthogonal $F_b$.)



I will use the fidelity $F(\rho, \sigma) := \big(\mathrm{tr} \sqrt{\rho^{1/2} \sigma \rho^{1/2}}\big)^2$ [15], [16], [17], in specifying a measure
of disturbance for quantum states. For pure states $\rho = |\psi\rangle\langle\psi|$, this is just $\langle\psi|\sigma|\psi\rangle$. It is
unity when $\rho = \sigma$, and zero when their supports are orthogonal. It is therefore a reasonable
measure of how similar two quantum states are. We may define $1 - F(\rho, \mathcal{A}(\rho))$ to be the
disturbance to the state $\rho$ by a measurement procedure resulting in the operation $\mathcal{A} = \sum_b \mathcal{A}_b$:

$$ D := 1 - F\Big(\rho, \sum_b \mathcal{A}_b(\rho)\Big) . \tag{6} $$

Given an ensemble of density operators $\{\rho_\alpha, d\mu(\alpha)\}_\alpha$, there are several ways one might
construct a measure of the average disturbance caused by measurement. For example, one
might consider one minus the ensemble average fidelity to the input density operator, of the
post-measurement density operator obtained from each ensemble member by averaging over
measurement results:

$$ D_1 := 1 - \int d\mu(\alpha)\, F(\rho_\alpha, \mathcal{A}(\rho_\alpha)) . \tag{7} $$

More reasonable in the context of measurement might be to consider the fidelity of input density
operators to the output density operator $\mathcal{A}_b(\rho_\alpha)$ conditional on the measurement result $b$,
averaged over both the input ensemble and the measurement result:

$$ D_2 := 1 - \int d\mu(\alpha) \sum_b F(\rho_\alpha, \mathcal{A}_b(\rho_\alpha)) . \tag{8} $$

This is disturbance from the point of view of someone carrying out the measurement, or
apprised of its result; the previous quantity is from the point of view of an outside observer
who does not know the result. Since $F$ is not linear, these do not define the same quantity;
by the concavity of fidelity [17], $D_2 \ge D_1$. One might also consider the disturbance measures
obtained by replacing the first argument of the fidelity function, $\rho_\alpha$ in the above formulae,
by the ensemble average density operator $\int d\mu(\alpha) \rho_\alpha$. These measures seem much less natural
(and, again by concavity, each is less than the corresponding one of $D_1$, $D_2$). For the case
of ensembles of pure input states ($\rho_\alpha$ pure), $D_1$ and $D_2$ coincide. For the rest of this paper,
I will consider pure input states, and use this disturbance measure. This is also the measure
used by Fuchs and Peres.



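The average disturbance to an initial pure state, where the average is taken over some
ensemble of pure states specified by a probability measure $\mu(|\psi\rangle)$ on Hilbert space, is given
by

$$ \begin{aligned}
D &:= 1 - \int d\mu(|\psi\rangle) \sum_b F\big(\mathcal{A}_b(|\psi\rangle\langle\psi|), |\psi\rangle\langle\psi|\big) \\
  &= 1 - \int d\mu(|\psi\rangle) \sum_{bi} |\langle\psi|A_{bi}|\psi\rangle|^2 .
\end{aligned} \tag{9} $$

The ensemble I will be most concerned with is $d\mu(|\psi\rangle) = d\Omega_\psi$, the unitarily invariant measure
on Hilbert space, normalized to integrate to 1.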

To measure the information gained about an initial ensemble $\Psi \sim d\mu(|\psi\rangle)$, I will use the
mutual information between the prior distribution and the measurement outcome, denoted
$H(\Psi : B)$. Note that $\Psi$ is a random variable taking Hilbert space vectors as values, and
distributed according to $d\mu(|\psi\rangle)$; $B$ is a random variable taking measurement results $b$ as
values, distributed according to $p(b|\psi) = \mathrm{tr}\, F_b |\psi\rangle\langle\psi| = \langle\psi|F_b|\psi\rangle$, conditional on the initial
state $|\psi\rangle$. The information gain is:

$$ H(B : \Psi) = H(B) - H(B|\Psi) . \tag{10} $$

The second term is the average, using the measure $d\mu(|\psi\rangle)$ over states $|\psi\rangle$, of the conditional
information

$$ H(B \,|\, |\psi\rangle) := - \sum_b p(b|\psi) \log p(b|\psi) . \tag{11} $$
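For a discrete ensemble, the information gain of Eqs. (10) and (11) reduces to a finite sum, which the following sketch evaluates. The function names, the choice of base-2 logarithms, and the example probability tables are assumptions made for illustration; the text does not fix a logarithm base and works mainly with a continuous ensemble.

```python
import math

def entropy(dist):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in dist if p > 0.0)

def information_gain(priors, cond):
    """H(B : Psi) = H(B) - H(B | Psi) for a discrete ensemble.

    priors[k]  : prior probability of the k-th state |psi_k>
    cond[k][b] : p(b | psi_k) = <psi_k| F_b |psi_k>
    """
    n_out = len(cond[0])
    # Marginal outcome distribution p(b) = sum_k p_k p(b | psi_k).
    marginal = [sum(pk * row[b] for pk, row in zip(priors, cond))
                for b in range(n_out)]
    # Ensemble average of the conditional entropy of Eq. (11).
    h_cond = sum(pk * entropy(row) for pk, row in zip(priors, cond))
    return entropy(marginal) - h_cond

# Two equiprobable orthogonal states, measured three hypothetical ways.
sharp = information_gain([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]])
unsharp = information_gain([0.5, 0.5], [[0.8, 0.2], [0.2, 0.8]])
trivial = information_gain([0.5, 0.5], [[1.0], [1.0]])
```

The sharp measurement recovers one full bit, the trivial one-outcome POVM recovers nothing, and the unsharp measurement lies in between.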
I will also occasionally consider a different measure of disturbance, involving the
entanglement fidelity

$$ F_e(\rho, \mathcal{A}) := \sum_{bi} |\mathrm{tr}\, A_{bi} \rho|^2 . \tag{12} $$

The entanglement fidelity of a density operator $\rho$ under an operation $\mathcal{A}$ is less than or
equal to the average pure-state fidelity of any ensemble for $\rho$ under $\mathcal{A}$ [[HJ. I will define
the entanglement disturbance $D_e(\rho, \mathcal{A})$ to be $1 - F_e(\rho, \mathcal{A})$. It is an upper bound to the
pure-state disturbance to any ensemble for the density operator $\rho$. Since it is defined for
an initial density operator rather than an initial ensemble, it is less suitable than (6) for
use in an information-disturbance relation like that described above, since the information
gain against which disturbance is graphed involves a particular initial ensemble. However,
it does provide a lower bound to the information-disturbance frontier. (We could of course
just fix some ensemble, such as the "Scrooge" ensemble for the density operator $\rho$ JJU
which is the one about which the minimum information is gained, and plot information gain
against minimum entanglement disturbance for this particular fidelity measure. One might
speculate that the entanglement disturbance would provide a reasonably tight bound on the
disturbance to the Scrooge ensemble.)

Given a measurement and a known initial state $|\psi_0\rangle$, it is easy to come up with an
operation, consistent with the measurement, which minimizes the pure-state disturbance
(^): just set the state back to its initial value no matter what. This may be accomplished by
letting $A_{bi} = \lambda_{bi} |\psi_0\rangle\langle b_i|$, where $\lambda_{bi}$ and $|b_i\rangle$ are the eigenvalues and eigenvectors of $F_b^{1/2}$. (It
is easily checked that this measurement has average fidelity one, and satisfies the criterion
(3) for compatibility with the POVM $\{F_b\}$.) But this measurement will severely disturb
other initial states. When we set up our measuring apparatus we may or may not know
anything about the states we are going to be measuring. A fair way of assessing whether an
operation corresponding to a set of effects is minimally disturbing, without assuming any
prior knowledge about the state to be measured, is to minimize the disturbance averaged
over initial pure states with the unitarily invariant measure. This also makes the problem
of finding the least disturbing measurement analytically tractable.

Ultimately, one would like to find the information-disturbance frontier for a given ensemble,
defined as the graph of minimal disturbance for a given amount of information collected
about the initial state, against information collected. (We could equivalently define it via 
the dual optimization problem, as the graph of maximal information collectable by a mea- 
surement causing no more than a fixed amount of disturbance, against that disturbance.) 
Formally, we must define this graph as the infimum of disturbance for a given amount of 
information collected about the state, and show that this infimum is in fact attainable. 

Short of an explicit expression (which seems unlikely for a general ensemble), one would 
like to derive general properties of this frontier — such as the fact that minimal disturbance 
increases with information collected. This may appear obvious: one could argue that we 
couldn't cause less disturbance by collecting more information, for then one could just collect 
the smaller amount of information by doing an experiment that would collect more informa- 
tion with less disturbance, but adding noise to the readout, or not looking at all details of 
the answer. Fuchs and Peres |H| have explored this frontier for two-state ensembles, with 
possible applications to quantum cryptography. 
Some progress toward the structure of the information-disturbance frontier may be made
by noting that both disturbance measures considered above (and indeed also all the disturbance
measures which are one minus an average ensemble fidelity) are linear in the operation,
and the information is linear in the POVM. More precisely, from a set of POVM's $\{F^i_b\}$
(where $i$ indexes which POVM and $b$ indexes which operator in the POVM) and associated
sets of trace-preserving operations $\{\mathcal{A}^i\}$ indexed by $i$ with operator decompositions $\{A^i_{bj}\}$ we
construct the POVM's and operations which are convex combinations of these, with weights
$\lambda_i \ge 0$, $\sum_i \lambda_i = 1$:

$$ \{G_{ib}\} := \{\lambda_i F^i_b\} , \tag{13} $$
$$ \mathcal{B} \sim \{\sqrt{\lambda_i}\, A^i_{bj}\} . \tag{14} $$

Then for any of the disturbance measures discussed above ($1 - F_e(\rho, \mathcal{A})$ and $1 - F(E, \mathcal{A})$,
regardless of the density operator $\rho$ or the ensemble $E$ used in the average), we have

$$ D(\mathcal{B}) = \sum_i \lambda_i D(\mathcal{A}^i) . \tag{15} $$

Also, for any ensemble of states:

$$ \overline{H}(\{G_{ib}\}) = \sum_i \lambda_i \overline{H}(\{F^i_b\}) , \tag{16} $$

where the overbar indicates the ensemble average over the information conditional on the
input state. Hence, given any two points in the information-disturbance feasible set, the line
joining them is entirely within the set. This implies

Theorem 1 The information-disturbance frontier $D(I)$ for a pure-state ensemble is convex.

(Our convention is that a function $f$ is convex if $\lambda f(x) + (1 - \lambda) f(y) \ge f(\lambda x + (1 - \lambda) y)$,
i.e. the average of the function is greater than or equal to the function of the average.)
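The linearity of the information gain under mixing, which drives the convexity argument, can be checked numerically for a discrete ensemble. In the sketch below, performing procedure $a$ with probability $w$ and procedure $b$ otherwise yields a POVM whose outcome table is the weighted concatenation of the two original tables. The helper names, the base-2 logarithm, and the example tables are assumptions made for illustration.

```python
import math

def entropy(dist):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in dist if p > 0.0)

def information_gain(priors, cond):
    """Mutual information between the state label and the outcome."""
    n_out = len(cond[0])
    marginal = [sum(pk * row[b] for pk, row in zip(priors, cond))
                for b in range(n_out)]
    h_cond = sum(pk * entropy(row) for pk, row in zip(priors, cond))
    return entropy(marginal) - h_cond

def mix(weight, cond_a, cond_b):
    """Outcome table of the mixed POVM {w F^a_b} union {(1 - w) F^b_b}."""
    return [[weight * p for p in row_a] + [(1.0 - weight) * p for p in row_b]
            for row_a, row_b in zip(cond_a, cond_b)]

priors = [0.5, 0.5]
cond_a = [[0.8, 0.2], [0.2, 0.8]]   # hypothetical unsharp measurement
cond_b = [[1.0, 0.0], [0.0, 1.0]]   # hypothetical sharp measurement
w = 0.5

info_mixed = information_gain(priors, mix(w, cond_a, cond_b))
info_expected = (w * information_gain(priors, cond_a)
                 + (1.0 - w) * information_gain(priors, cond_b))
```

The two numbers agree because the label saying which procedure was run is independent of the input state, so it carries no information by itself. The corresponding linearity of the disturbance follows directly from the factor $\sqrt{\lambda_i}$ in the mixed decomposition, since each term $|\langle\psi|A|\psi\rangle|^2$ is simply rescaled by $\lambda_i$.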

Since the disturbance measures under consideration are positive, and one endpoint of 
D(I) is at the origin, this implies that the information-disturbance frontier for a pure-state 
ensemble is nondecreasing: minimal disturbance is nondecreasing with information obtained. 
That is, 

Proposition 2 For any disturbance measure of the form $1 - F(E, \mathcal{A})$ or $1 - F_e(\rho, \mathcal{A})$ the
minimal disturbance required to obtain a given amount of information about some fixed
ensemble (which need not even be that used in the disturbance measure) is nondecreasing in
the amount of information obtained. In fact, it may have a flat section following the
zero-information endpoint, but at some point must become and remain monotonically increasing.

We may use this fact to show:

Proposition 3 For any pure-state ensemble, if a point on the upward-sloping portion of
the information-disturbance frontier is attainable then it is attainable by a POVM $\{F_b\}$
measured in such a way that the conditional operations $\{\mathcal{A}_b\}$ may be taken to have a
one-operator decomposition.
We will say such a measurement procedure has one-term conditional dynamics.

Proof: Consider an operation $\mathcal{A}$ compatible with a POVM $\mathcal{E}$, and suppose that the
procedure $(\mathcal{E}, \mathcal{A})$ achieves some point $\zeta$ on the upward-sloping portion of the
information-disturbance frontier. Let $(\mathcal{E}, \mathcal{A})$ have multiterm conditional dynamics. Then $\mathcal{A}$ is also
compatible with some POVM $\mathcal{T}$ which finegrains $\mathcal{E}$, such that $(\mathcal{T}, \mathcal{A})$ exhibits one-term
conditional dynamics. $\mathcal{T}$ gathers no less information than $\mathcal{E}$. If it gathers the same amount,
then $\zeta$ is achievable by $(\mathcal{T}, \mathcal{A})$ and the proposition is established for the point $\zeta$. If it
gathers more information than $\mathcal{E}$, then by the strict monotonicity of this portion of the
frontier (Proposition 2) $\mathcal{A}$ must have disturbance greater than the minimal disturbance for
$\mathcal{E}$, contradicting the assumption that it was least-disturbing for $\mathcal{E}$, and so establishing the
proposition. ■

So in investigating measurement procedures achieving the information-disturbance frontier,
we may confine our attention to those with a single $A_b$ for each POVM element.

In fact, we can also show that for any feasible information-disturbance combination
$(D, I)$, there exist ways of achieving $(D, I')$ and $(D', I)$ with one-term conditional dynamics,
where $D' \le D$ and $I' \ge I$. The first is done by considering the fine-grained POVM of
the proof above; the second by mixing this with the trivial POVM, $I$, measured with
one-term conditional dynamics. This enables us to confine our attention, when considering the
form of the information-disturbance frontier, to measurement procedures exhibiting
one-term conditional dynamics, even without any assumption that the frontier is attainable.

However, one might also wish to directly show the superiority of the single-term oper- 
ations for arbitrary POVM's, and possibly even for ensembles other than the uniform one. 
That is, one might hope to show 

Possibility 4 For any POVM and any pure state ensemble, the set of operations least- 
disturbing to that ensemble and compatible with that POVM contains an operation with 
one-term conditional dynamics. 

One might even try to show that the least-disturbing operations compatible with a POVM 
all have one-term conditional dynamics. (To show this, the definition of one-term conditional 
dynamics would have to be modified so as to include, at least, "trivial" multiterm conditional
dynamics in which the many Kraus operators $A_{bi}$ have, when polar decomposed, the same
isometric part, and positive parts proportional to each other.) Multiple-term operations 
consistent with the same POVM involve potentially collecting more information, and so it 
seems reasonable that this would cause more disturbance. Conceivably, however, it might 
cause less disturbance if the additional information helped restore the initial state better 
than could be done without it. 

It appears difficult to establish the desired property in general, but we may show it for 
the uniform ensemble. (It is easy to show if our disturbance measure, instead of an ensemble 
average fidelity, is one minus the entanglement fidelity of the uniform density operator; this 
is done in Appendix 

Theorem 5 One-term conditional dynamics always give a minimally-disturbing way of
measuring a given POVM, on the uniform ensemble.

Consider the contribution to $F$ from a particular value of $b$:

$$ \sum_i \int d\Omega_\psi\, |\langle\psi|A_{bi}|\psi\rangle|^2 = \sum_i \int d\Omega_\psi\, \langle\psi|A_{bi}|\psi\rangle \langle\psi|A_{bi}^\dagger|\psi\rangle . \tag{17} $$

The disturbance in the multi-term case separates into terms for each $A_{bi} = U_{bi} P_{bi}$, in which $i$
indexes the different operators corresponding to the outcome $b$, $U_{bi}$ is unitary and $P_{bi}$ positive
(the polar decomposition again). From this and the result of || (cf. PTJ) that for $X > 0$,
$\max_{\text{unitary } V} |\mathrm{tr}\, V X|$ occurs where $VX = \sqrt{X^\dagger X}$, it follows that $|\langle\psi|A_{bi}|\psi\rangle|^2$ is maximized where
$U_{bi} = I$, so $A_{bi} = P_{bi}$.

We therefore proceed by a proposition which will be proved below.

Proposition 6 For any $|\psi\rangle$ and positive $P_1$, $P_2$,

$$ \langle\psi|P_1|\psi\rangle^2 + \langle\psi|P_2|\psi\rangle^2 \le \langle\psi|\sqrt{P_1^2 + P_2^2}\,|\psi\rangle^2 . \tag{18} $$

This implies that $A_b = \sqrt{F_b}$ is a minimally disturbing operation on the uniform ensemble for general POVM's,
since any (finite) purportedly better set of operations can be repeatedly coarse-grained in
the manner of Equation (18) to arrive at $\sqrt{F_b} = \sqrt{\sum_i P_{bi}^2}$. This proves Theorem 5. ■

In fact, Proposition 6 implies that for any initial ensemble, not just the uniform one,
coarse-graining the measurement decreases the disturbance caused by measuring with
square-root conditional dynamics. However, this does not yet prove that coarse-graining a
measurement decreases the minimal disturbance for an arbitrary ensemble, for the minimally
disturbing operation compatible with a given POVM will generally not be the square-root
operation unless the ensemble is uniform.



For our application, we also have $P_1^2 + P_2^2 \le I$, but the proposition holds more generally.
Proposition 6 is not hard to prove when the $P_i$ commute. Let $P_1$ have (positive)
eigenvalues $\lambda_i$. Let $P_2$ have (positive) eigenvalues $\eta_i$ for the same eigenvectors as $P_1$, so
that they commute. Then $\sqrt{P_1^2 + P_2^2}$ commutes with them, and has positive eigenvalues
$\sqrt{\lambda_i^2 + \eta_i^2}$ and the same eigenvectors. We will use these eigenvectors as a basis and write the
inequality in components, with $x_i$ being the squared modulus of the $i$-th component of $|\psi\rangle$ in this basis. The desired
inequality (18) becomes:

$$ \begin{aligned}
\Big(\sum_i x_i \lambda_i\Big)^2 + \Big(\sum_i x_i \eta_i\Big)^2
  &\le \Big(\sum_i x_i \sqrt{\lambda_i^2 + \eta_i^2}\Big)^2 && (19) \\
  &= \sum_{ij} x_i x_j \sqrt{(\lambda_i^2 + \eta_i^2)(\lambda_j^2 + \eta_j^2)} && (20) \\
  &= \sum_{ij} x_i x_j \sqrt{\lambda_i^2 \lambda_j^2 + \eta_i^2 \eta_j^2 + \lambda_i^2 \eta_j^2 + \lambda_j^2 \eta_i^2} . && (21)
\end{aligned} $$

Rewriting the LHS as

$$ \sum_{ij} x_i x_j \sqrt{\lambda_i^2 \lambda_j^2 + \eta_i^2 \eta_j^2 + 2 \lambda_i \lambda_j \eta_i \eta_j} , \tag{22} $$

we see that if

$$ \lambda_i^2 \eta_j^2 + \lambda_j^2 \eta_i^2 \ge 2 \lambda_i \lambda_j \eta_i \eta_j $$

then the LHS is less than or equal to the RHS. And this is indeed the case: letting $a = \lambda_i \eta_j$ and $b = \lambda_j \eta_i$,
it reduces to the fact that $a^2 + b^2 \ge 2ab$ (which is true since $(a - b)^2 \ge 0$, with equality
iff $a = b$). Equality in our expression occurs when $\lambda_i \eta_j = \lambda_j \eta_i$ for all $i, j$, that is, when
$\lambda_i / \lambda_j = \eta_i / \eta_j$. In other words, the POVM elements $P_{bi}^2$ are proportional to each other. This
implies that knowing which of them occurred gives us no additional information about the
state.
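The commuting case of Proposition 6 is easy to test numerically in components. The sketch below draws random nonnegative eigenvalues and random probability weights (the squared moduli of the components of the state) and records the smallest gap between the two sides; it also evaluates one case with proportional eigenvalues, where equality is expected. The function names, the seed, and the sample sizes are arbitrary choices made for illustration.

```python
import math
import random

def lhs(x, lam, eta):
    """(sum_i x_i lam_i)^2 + (sum_i x_i eta_i)^2."""
    return (sum(xi * li for xi, li in zip(x, lam)) ** 2
            + sum(xi * ei for xi, ei in zip(x, eta)) ** 2)

def rhs(x, lam, eta):
    """(sum_i x_i sqrt(lam_i^2 + eta_i^2))^2."""
    return sum(xi * math.hypot(li, ei)
               for xi, li, ei in zip(x, lam, eta)) ** 2

def random_instance(rng, dim):
    """Random probability weights x and nonnegative eigenvalue lists."""
    raw = [rng.random() for _ in range(dim)]
    total = sum(raw)
    x = [r / total for r in raw]
    lam = [rng.random() for _ in range(dim)]
    eta = [rng.random() for _ in range(dim)]
    return x, lam, eta

rng = random.Random(0)
instances = [random_instance(rng, 4) for _ in range(1000)]
worst_gap = min(rhs(*inst) - lhs(*inst) for inst in instances)

# Proportional eigenvalues (eta = 2 * lam) should give equality.
x_eq, lam_eq, eta_eq = [0.3, 0.7], [1.0, 2.0], [2.0, 4.0]
equality_gap = rhs(x_eq, lam_eq, eta_eq) - lhs(x_eq, lam_eq, eta_eq)
```

Viewed this way, the commuting case is the triangle inequality for the weighted sum of the planar vectors $(\lambda_i, \eta_i)$, which is why equality requires all of these vectors to be parallel.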

In the general case, Proposition 6 follows quickly from the following theorem of T. Ando [22],
which is easily seen to be equivalent to Lieb's concavity theorem ([23]; see also discussions
in especially p. 273, and p5[|).

Theorem 7 (Ando) For $0 < t < 1$, the map

$$ (A, B) \mapsto A^t \otimes B^{1-t} \tag{23} $$

is jointly concave on pairs of positive operators $A$, $B$.

Proof of Proposition 6: Consider the map from operators to the reals given by:

$$ T(A) = \langle\psi|A^{1/2}|\psi\rangle^2 . \tag{24} $$

Then (18) is equivalent to the superadditivity of $T$: $T(A) + T(B) \le T(A + B)$ on
the cone of positive operators (let $A = P_1^2$, $B = P_2^2$). Since $T$ is linearly homogeneous
($T(\lambda A) = \lambda T(A)$), this is equivalent to the concavity of $T$ on the unit interval. Also,
$T(A) = \langle\psi|\langle\psi| A^{1/2} \otimes A^{1/2} |\psi\rangle|\psi\rangle$. Ando's theorem has as a special case the concavity of
the mapping $A \mapsto A^{1/2} \otimes A^{1/2}$, which implies that any diagonal matrix element of it (in
any basis), including that between $|\psi\rangle|\psi\rangle$ and itself, is a concave function. (Ando's theorem
holds on the entire cone of positive operators, which is why we did not need the restriction
$P_1^2 + P_2^2 \le I$ in Proposition 6.) ■

IV. MINIMALLY-DISTURBING OPERATIONS COMPATIBLE WITH A GIVEN MEASUREMENT



With arbitrary POVMs, with the operation for each measurement outcome $b$ given by a single decomposition operator $A_b$, we can show that $A_b = F_b^{1/2}$ is a minimal-disturbance operation and evaluate the minimal disturbance. That is,

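Theorem 8 Let $\{F_b\}$ be a POVM, and let $\{\mathcal{A}_b\}$ be a set of operations compatible with that
POVM. If each $\mathcal{A}_b$ has an operator decomposition consisting of a single operator, then

$$\sum_b \int d\Omega_\psi\, \langle\psi|\mathcal{A}_b(|\psi\rangle\langle\psi|)|\psi\rangle
\le \sum_b \int d\Omega_\psi\, |\langle\psi|F_b^{1/2}|\psi\rangle|^2
= \frac{1}{d(d+1)}\Big(d + \sum_b (\mathrm{tr}\, F_b^{1/2})^2\Big). \tag{25}$$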



The proof proceeds via the following Lemma, which also appears with a different proof in [26].



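Lemma 9 Define

$$\Pi := \int d\Omega_\psi\, |\psi\rangle\langle\psi| \otimes |\psi\rangle\langle\psi|. \tag{26}$$

Then

$$\Pi = \frac{1}{d(d+1)} \sum_{ij} \big( |i\rangle\langle i| \otimes |j\rangle\langle j| + |i\rangle\langle j| \otimes |j\rangle\langle i| \big). \tag{27}$$

Proof of Lemma: Expanding in an orthonormal basis,

$$\Pi = \int d\Omega_\psi \sum_{ijlm} \langle i|\psi\rangle\langle\psi|j\rangle\langle l|\psi\rangle\langle\psi|m\rangle\; |i\rangle\langle j| \otimes |l\rangle\langle m|. \tag{28}$$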

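With the notation $\langle i|\psi\rangle = r_i e^{i\phi_i}$, etc., the $ijlm$-th matrix element of $\Pi$ may be written as:

$$\int dr\, d\phi\; \delta(|r| - 1)\, r_i r_j r_l r_m\, e^{i(\phi_i - \phi_j + \phi_l - \phi_m)}. \tag{29}$$

Here $dr = dr_1 dr_2 \cdots dr_d$, $d\phi = d\phi_1 \cdots d\phi_d$.

The angular integrals give zero except in three cases, for which the matrix elements are as
follows:

1. $i = j$, $l = m$, $i \ne l$: $\int d\Omega_\psi\, |\langle i|\psi\rangle|^2 |\langle l|\psi\rangle|^2$

2. $i = m$, $j = l$, $i \ne j$: $\int d\Omega_\psi\, |\langle i|\psi\rangle|^2 |\langle j|\psi\rangle|^2$ (30)

3. $i = j = l = m$: $\int d\Omega_\psi\, |\langle i|\psi\rangle|^4$.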

The integrals are easily done using Eq. (12) of Jones [27], which yields:

$$\int d\Omega_\psi\, |\langle\psi|a\rangle|^2 |\langle\psi|b\rangle|^2 = \frac{1 + |\langle a|b\rangle|^2}{d(d+1)}, \tag{31}$$

where $|a\rangle$, $|b\rangle$ are any normalized, but not necessarily orthogonal or identical, vectors. For
our cases 1 and 2, the matrix elements are $1/d(d+1)$; case 3 gives $2/d(d+1)$. We combine
1/2 times the case 3 terms with each of case 1 and 2, enabling us to remove the inequality
condition on the indices, and change the dummy index $l$ to $j$ to obtain the Lemma. ■
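As an illustration of the Lemma, the closed form can be compared with a Monte Carlo average over random pure states. This is a sketch under stated assumptions rather than part of the proof: it assumes NumPy, samples the unitarily invariant measure by normalizing complex Gaussian vectors, and uses hypothetical helper names (`haar_states`, `pi_monte_carlo`, `pi_exact`).

```python
import numpy as np

def haar_states(n, d, rng):
    """n unit vectors in C^d drawn from the unitarily invariant measure
    (normalized complex Gaussian vectors)."""
    v = rng.normal(size=(n, d)) + 1j * rng.normal(size=(n, d))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def pi_monte_carlo(n, d, rng):
    """Sample average of |psi><psi| (x) |psi><psi|."""
    psi = haar_states(n, d, rng)
    # Row k holds the components of |psi_k>|psi_k> in the product basis.
    two = np.einsum('ni,nj->nij', psi, psi).reshape(n, d * d)
    return two.T @ two.conj() / n

def pi_exact(d):
    """Closed form of the Lemma: (I + SWAP) / (d (d + 1))."""
    swap = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            swap[i * d + j, j * d + i] = 1.0   # the term |i><j| (x) |j><i|
    return (np.eye(d * d) + swap) / (d * (d + 1))

rng = np.random.default_rng(1)
err = np.max(np.abs(pi_monte_carlo(200_000, 3, rng) - pi_exact(3)))
print("largest entrywise deviation:", err)
```

The deviation shrinks like the inverse square root of the sample size, so agreement is only statistical.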
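Proof of Theorem 8:
Note that

$$\int d\Omega_\psi\, \langle\psi|A|\psi\rangle\langle\psi|B|\psi\rangle = \mathrm{tr}\,(\Pi(A \otimes B)). \tag{32}$$

Lemma 9 enables one to write this as $(1/d(d+1)) \sum_{ij} (\langle i|A|i\rangle\langle j|B|j\rangle + \langle i|A|j\rangle\langle j|B|i\rangle)$.
Hence the average overlap becomes:

$$\frac{1}{d(d+1)} \sum_b \sum_{ij} \big( \langle i|A_b|i\rangle\langle j|A_b^\dagger|j\rangle + \langle i|A_b|j\rangle\langle j|A_b^\dagger|i\rangle \big)
= \frac{1}{d(d+1)} \sum_b \big( |\mathrm{tr}\, A_b|^2 + \mathrm{tr}\, A_b A_b^\dagger \big). \tag{33}$$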



By the linearity and cyclicity of the trace and the fact that $\mathcal{A}$ is trace-preserving ($\sum_b A_b^\dagger A_b = I$), the second term in parentheses sums to $d$. We wish to maximize this overlap (thereby minimizing disturbance) over all single-term operations compatible with $F_b$. So for each $b$, we maximize the $b$-th term over all $A_b$ such that $A_b^\dagger A_b = F_b$. By the polar decomposition of operators, such $A_b$ have the form $U_b F_b^{1/2}$. From this and the result (cf. [21]) that $\max_{\text{unitary } V} |\mathrm{tr}\, VA|$ occurs where $VA = \sqrt{A^\dagger A}$, it follows that $|\mathrm{tr}\, A_b|$ is maximized where $U_b = I$, so $A_b = F_b^{1/2}$. Thus the optimum overlap is obtained with the square root conditional dynamics. It is given by:

$$F_{\max} = \frac{1}{d(d+1)}\Big(d + \sum_b (\mathrm{tr}\, F_b^{1/2})^2\Big). \tag{34}$$

The corresponding minimal disturbance is

$$D_{\min} = 1 - F_{\max}. \tag{35}$$



Consider the special case of effects proportional to one-dimensional projectors. The effects $F_b$ become $g_b |b\rangle\langle b|$, where $g_b$ are proportionality constants satisfying $\sum_b g_b = d$. The optimum overlap and disturbance for the uniform ensemble, with one-term conditional dynamics, are given by:

$$F_{\max} = \frac{2}{d+1}, \qquad D_{\min} = \frac{d-1}{d+1}. \tag{36}$$
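Equations (34) to (36) are easy to evaluate for concrete POVMs. The sketch below assumes NumPy, and its function names (`max_overlap`, `min_disturbance`, `projective`, `trine`) are hypothetical. It computes $F_{\max}$ and $D_{\min}$ for an orthogonal measurement, for a non-orthogonal fine-grained qubit POVM (three real vectors 120 degrees apart, each with weight 2/3), and for the trivial one-outcome measurement.

```python
import numpy as np

def trace_sqrt(f):
    """tr F^(1/2) for a positive semidefinite effect F."""
    w = np.clip(np.linalg.eigvalsh(f), 0.0, None)
    return float(np.sum(np.sqrt(w)))

def max_overlap(povm):
    """Eq. (34): (d + sum_b (tr F_b^(1/2))^2) / (d (d + 1))."""
    d = povm[0].shape[0]
    return (d + sum(trace_sqrt(f) ** 2 for f in povm)) / (d * (d + 1))

def min_disturbance(povm):
    """Eq. (35): D_min = 1 - F_max."""
    return 1.0 - max_overlap(povm)

def projective(d):
    """Orthogonal rank-one projectors |b><b| (all weights g_b = 1)."""
    return [np.outer(e, e) for e in np.eye(d)]

def trine():
    """Qubit POVM (2/3)|b><b| with three real vectors 120 degrees apart."""
    angles = 2 * np.pi * np.arange(3) / 3
    vecs = [np.array([np.cos(t), np.sin(t)]) for t in angles]
    return [(2.0 / 3.0) * np.outer(v, v) for v in vecs]

for name, povm in [("projective, d=3", projective(3)),
                   ("trine, d=2", trine()),
                   ("identity, d=3", [np.eye(3)])]:
    print(name, max_overlap(povm), min_disturbance(povm))
```

Both fine-grained examples should land on $2/(d+1)$ regardless of orthogonality, while the identity effect gives overlap one, i.e. no disturbance.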



V. INFORMATION 

We have found, in Eq. (35), the minimum disturbance for measurement of an arbitrary
POVM. This is a step towards deriving the information-disturbance frontier. As a special 
case, we found the minimal disturbance to be (d — l)/(d + 1) for a class of measurements 
in which the effects are proportional to one-dimensional projectors. At the opposite pole 
from these "fine-grained measurements" is the ultimate coarse-grained measurement of a 
single effect which is the identity operator. This yields zero information, and can be ac- 
complished with no disturbance. These extreme cases presumably represent the endpoints 
of the information-disturbance frontier. Another step toward deriving the frontier is to 
find the information gained in measurements of the fine-grained type investigated above, 
which is clearly greater than zero, as is the disturbance they cause. This will pin down the 
maximal-information endpoint. It turns out that the information yield is the same for all
such fine-grained measurements, whether the effects are orthogonal or not. This is a special
case of the fact that any fine-grained measurement gives the same information about the
"Scrooge" ensemble. (The Scrooge ensemble for a given density operator $\rho$ is defined as the
ensemble (from among ensembles for $\rho$) for which the accessible information is minimal [20].
The uniform ensemble is the Scrooge ensemble for the uniform density operator $I/d$.)

Here I present a different derivation of the information gained by a fine-grained measurement, which applies to the uniform ensemble only and uses the methods of Jones [27].
Recall that the information gain from measurement is the mutual information between the
prior distribution and the measurement outcome, denoted $H(\Psi : B)$. I will use this in the
form:

$$H(B : \Psi) = H(B) - H(B|\Psi). \tag{37}$$

This can be calculated from the prior probability measure on states $p(|\psi\rangle)$, which we assume
to be the unitarily invariant one, and the conditional probabilities $p(b|\psi)$ of the data (measurement outcomes) given the initial state, which are $\mathrm{tr}\,(g_b |b\rangle\langle b|\psi\rangle\langle\psi|) = g_b |\langle b|\psi\rangle|^2$. (Here I
use the notation for fine-grained measurements introduced at the end of Section IV.) The
first term is

$$H(B) = -\sum_b p(b) \log p(b). \tag{38}$$

Since

$$p(b) = \int d\Omega_\psi\, p(b|\psi) = g_b \int d\Omega_\psi\, |\langle b|\psi\rangle|^2 = \frac{g_b}{d}, \tag{39}$$

we have

$$H(B) = -\sum_b \frac{g_b}{d} \log \frac{g_b}{d} = -\frac{1}{d}\sum_b g_b \log g_b + \log d, \tag{40}$$

where I have used equation (7) of [27] to do the integral, and have also made use of the fact
that $\sum_b g_b = d$.

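The second term is:

$$H(B|\Psi) = -\int d\Omega_\psi \sum_b p(b|\psi) \log p(b|\psi) \tag{41}$$

$$= -\int d\Omega_\psi \sum_b g_b |\langle b|\psi\rangle|^2 \log\big(g_b |\langle b|\psi\rangle|^2\big) \tag{42}$$

$$= -\sum_b g_b \int d\Omega_\psi\, |\langle b|\psi\rangle|^2 \big(\log g_b + \log |\langle b|\psi\rangle|^2\big) \tag{43}$$

$$= -\sum_b g_b \log g_b \int d\Omega_\psi\, |\langle b|\psi\rangle|^2 - \sum_b g_b \int d\Omega_\psi\, |\langle b|\psi\rangle|^2 \log |\langle b|\psi\rangle|^2. \tag{44}$$

The first integral is the same one we encountered in $H(B)$, and its value is $1/d$. The second
integral is more complicated, but can be done using the same formula as the first (or see
[28]); its value (with natural logarithms) is

$$\int d\Omega_\psi\, |\langle b|\psi\rangle|^2 \log |\langle b|\psi\rangle|^2 = -\frac{1}{d}\sum_{k=1}^{d-1} \frac{1}{1+k}. \tag{45}$$

Hence

$$H(B|\Psi) = -\frac{1}{d}\sum_b g_b \log g_b + \sum_{k=1}^{d-1} \frac{1}{1+k}. \tag{46}$$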



Combining equations (46) and (40), we obtain

$$H(B : \Psi) = \log d - \sum_{k=1}^{d-1} \frac{1}{1+k}. \tag{47}$$

This depends only on $d$, and not on the weights $g_b$: as long as the $F_b$ are proportional
to one-dimensional projectors, the information gained about a maximally uncertain initial
pure state is the same, whether the measurement is of orthogonal projectors or some other
set of maximally fine-grained effects.
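The harmonic-sum form of the information gain can be checked against a direct numerical integration. The following sketch uses only the Python standard library; the function names are hypothetical, and it relies on one standard fact not derived in the text: under the unitarily invariant measure, $x = |\langle b|\psi\rangle|^2$ has probability density $(d-1)(1-x)^{d-2}$ on $[0,1]$. Logarithms are natural, so the result is in nats.

```python
import math

def info_gain(d):
    """Eq. (47): ln d - sum_{k=1}^{d-1} 1/(1+k), in nats."""
    return math.log(d) - sum(1.0 / (1 + k) for k in range(1, d))

def mean_x_log_x(d, n=200_000):
    """Midpoint-rule estimate of the integral of x ln x against the
    density (d-1)(1-x)^(d-2) of x = |<b|psi>|^2 on [0, 1]."""
    h = 1.0 / n
    total = 0.0
    for j in range(n):
        x = (j + 0.5) * h
        total += x * math.log(x) * (d - 1) * (1.0 - x) ** (d - 2)
    return total * h

for d in (2, 3, 4):
    harmonic = sum(1.0 / (1 + k) for k in range(1, d))
    # The second integral in the derivation should equal -harmonic / d.
    print(d, info_gain(d), mean_x_log_x(d), -harmonic / d)
```

For $d = 2$ the integral reduces to $\int_0^1 x \ln x\, dx = -1/4$, which gives a quick hand check of the quadrature.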

Unfortunately, finding the information gain from measuring an arbitrary POVM is a 
much more difficult problem. 



For the information-disturbance frontier, we need the information gain maximized over 
possible measurements and compatible operations causing a given level of disturbance (or 
less). Equivalently, we need the minimal disturbance measurement and associated operation 
which gives a fixed level of information gain. Since the minimal disturbance associated with 
all fine-grained measurements is the same, and they all yield the same information gain, 
we have found the high-information endpoint of the information-disturbance frontier. This is
because any other set of effects will be a blurring (by allowing positive operators not proportional
to projectors) or coarsening (by allowing higher-dimensional projectors) of these effects, 
resulting in less information gain and the possibility of less disturbance. Clearly, the other 
endpoint is at zero information and zero disturbance, achieved by the identity operation of 
doing nothing. One might speculate that the minimally disturbing measurement (for the 
uniform ensemble) for any given level of information obtained, is to measure a fine-grained 
set of effects with some probability, and otherwise to do nothing. That is, our POVM is 
given by the set $\{\alpha I, (1 - \alpha)F_b\}$, where the $F_b$ form a fine-grained POVM. Then the tradeoff
frontier is a straight line between the known endpoints. However, it seems unlikely that the 
frontier is perfectly straight. This would just be too boring to be true. In the next section, 
we will make some progress towards obtaining a closed form for the information-disturbance 
frontier, by showing that for each point on the frontier, there exists an optimal measurement 
procedure associated with a very simple operation, that of swapping in the maximally mixed 
state with some probability and otherwise leaving the state undisturbed. (This operation is 
not compatible with the measurement just discussed, that gives the straight-line frontier.) 



VI. THE INFORMATION-DISTURBANCE FRONTIER
VII. TOWARDS THE FULL FRONTIER



We will say an operation $\mathcal{A}$ is unitarily covariant if

$$W^\dagger \mathcal{A}(W \rho W^\dagger) W = \mathcal{A}(\rho) \tag{48}$$

for any unitary $W$. We will also introduce a convention for ensembles or sets denoted by
expressions within curly brackets. The convention is that when we put part of the expression
within the brackets as a subscript of the right-hand bracket, the overall expression refers
to the ensemble given by the expression within brackets, when only the subscripted piece
varies. Thus for example $\{\rho_{ij}\}_j$ refers to the ensemble of the $\rho_{ij}$ for various $j$ and fixed $i$.
(This is, therefore, the $i$-th in a list of ensembles indexed by $i$.)

Using the unitary invariance of the ensemble $\Omega$, we will show that

Theorem 10 There is always a unitarily covariant way of obtaining a given $I$ with minimal
disturbance to $\Omega$.

In other words, for this measurement and conditional dynamics the operation $\mathcal{A} := \sum_b \mathcal{A}_b$
is unitarily covariant.

Proof: By the unitary invariance of the ensemble $\Omega$, for any fixed unitary $U$, the POVM
$\{U F_b U^\dagger\}$ has the same information and the same minimal disturbance (for $\Omega$) as $\{F_b\}$.
(This is so because the information depends only on the probabilities $p_b = \langle\psi|U F_b U^\dagger|\psi\rangle$,
so transforming the POVM is equivalent to transforming the ensemble, which we know is
invariant. Similarly, the minimally disturbing conditional dynamics compatible with this
POVM are given by the operation with decomposition $A_b = (U F_b U^\dagger)^{1/2} = U F_b^{1/2} U^\dagger$. The
average disturbance depends on the $A_b$ only through $\langle\psi|A_b|\psi\rangle = \langle\psi|U F_b^{1/2} U^\dagger|\psi\rangle$, so again
we may view the unitary transformation as applied to the ensemble, which is invariant
under it.) By the linearity of disturbance and information in the POVM and operation,
respectively, the continuously indexed POVM

$$\{d\mu(U)\, U F_b U^\dagger\}_{b,U} \tag{49}$$

where POVM elements are indexed by both $b$ and $U$, achieves the same information and
disturbance as $\{F_b\}$. Here $d\mu(U)$ is the (unitarily invariant) Haar measure on the unitary
group $U(d)$. This POVM is unitarily invariant in the sense that applying any unitary $V$ to
all elements of the POVM just results in the same POVM with the elements reindexed. The
optimal associated operation is given by the continuous decomposition:

$$\{d\mu(U)^{1/2}\, U F_b^{1/2} U^\dagger\}_{b,U}. \tag{50}$$

The "square root of a measure" here is just formal notation. (For a rigorous treatment of
such operations as "Radon-Nikodym derivatives of quantum instruments", see [29], [30].)
This operation is defined by its action:

$$\mathcal{A}(\rho) = \sum_b \int d\mu(U)\, U F_b^{1/2} U^\dagger\, \rho\, U F_b^{1/2} U^\dagger \tag{51}$$

Note that the formal square root does not appear here. The unitary covariance of $\mathcal{A}$ is
straightforward from (51) and the unitary invariance of $d\mu$. ■

A unitarily covariant (which we will also call isotropic) operation may be viewed as
mixing in the uniform density operator with some probability $p$ (cf. e.g. [31], [32], [33]):

$$\mathcal{A}_p(\rho) = (1 - p)\rho + p(I/d). \tag{52}$$

This operation causes disturbance

$$D_{\min}(\Omega, \mathcal{A}_p) = p\,\frac{d-1}{d}. \tag{53}$$
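The disturbance formula and the covariance property of the isotropic operation can be confirmed on a random example. This is a minimal sketch assuming NumPy; the helper names are hypothetical, and the unitary is obtained from a QR decomposition (which yields a valid unitary, though not necessarily a Haar-distributed one). Because $\langle\psi|\mathcal{A}_p(|\psi\rangle\langle\psi|)|\psi\rangle = (1-p) + p/d$ for every pure state, the disturbance is state-independent and equals $p(d-1)/d$.

```python
import numpy as np

def isotropic(rho, p):
    """Eq. (52): (1 - p) rho + p I/d, for a unit-trace density matrix rho."""
    d = rho.shape[0]
    return (1.0 - p) * rho + p * np.eye(d) / d

def random_unitary(d, rng):
    """A unitary taken from the QR decomposition of a complex Gaussian matrix."""
    g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    q, _ = np.linalg.qr(g)
    return q

def random_state(d, rng):
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)

def disturbance(psi, p):
    """1 - <psi| A_p(|psi><psi|) |psi> for a pure input state."""
    rho = np.outer(psi, psi.conj())
    return 1.0 - float(np.real(psi.conj() @ isotropic(rho, p) @ psi))

def covariance_defect(rho, w, p):
    """Norm of W^dagger A_p(W rho W^dagger) W - A_p(rho); zero if covariant."""
    lhs = w.conj().T @ isotropic(w @ rho @ w.conj().T, p) @ w
    return float(np.linalg.norm(lhs - isotropic(rho, p)))

rng = np.random.default_rng(2)
d, p = 4, 0.3
psi = random_state(d, rng)
print(disturbance(psi, p), p * (d - 1) / d)
print(covariance_defect(np.outer(psi, psi.conj()), random_unitary(d, rng), p))
```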

To calculate the information-disturbance frontier, we now need only to calculate the
maximum, over POVMs compatible with the isotropic operation $\mathcal{A}_p$, of the information
gathered by the POVM. Here I will not give a closed form for the maximum, but I will give an
approach which reduces the problem from a constrained maximization to an unconstrained
one. To do this, we recall some more of the basic theory of quantum operations. By saying
a POVM is "compatible with the operation" $\mathcal{A}$, we mean that it can be measured by an
instrument which gives rise to that operation (when measurement results are averaged over).
Any POVM compatible with operation $\mathcal{A}$ is given by coarsegraining some set of operators
$\{F_b\}$ defined by $F_b = A_b^\dagger A_b$ for some decomposition $\{A_b\}$ of the operation $\mathcal{A}$. We need
only consider the POVM's obtained as $\{F_b := A_b^\dagger A_b\}$, and not the coarsegrainings, since the
coarsegrainings obtain less (or at least no more) information. Thus every decomposition of
an operation determines a compatible POVM, and all compatible POVM's are obtained by
this procedure (plus coarsegraining).

Any two decompositions of the same operation, $\{A_i\}$ having $r$ operators and $\{B_j\}$ having
$s$ operators, are related by:

$$A_i = \sum_{j=1}^{s} m_{ij} B_j \tag{54}$$

where $m$ is the matrix of a maximal partial isometry from the complex vector space $\mathbb{C}^s$ to $\mathbb{C}^r$.
A partial isometry is a generalization of a unitary operator, which must satisfy $VV^\dagger = \Pi$
for some projector $\Pi$. Such an isometry will then also satisfy $V^\dagger V = \Gamma$ for some projector
$\Gamma$ having the same rank as $\Pi$. If the range and domain spaces of a linear operator $V$ have
different dimensions, it will not be possible to find a unitary mapping between the two: the
best one can do is find a partial isometry $V$ such that one of $VV^\dagger$ and $V^\dagger V$ is the identity
(whichever one operates on the smaller space). We will call such a map a maximal partial
isometry between the spaces $S_1$ and $S_2$. A partial isometry with $VV^\dagger$ (and hence $V$) having
rank $C$ may be thought of as projecting onto a $C$-dimensional subspace of $V$'s domain
Hilbert space and then mapping that subspace unitarily to a $C$-dimensional subspace of the
range Hilbert space. Thus if $s < r$ in (54), $m$'s columns are $s$ orthonormal vectors in $\mathbb{C}^r$:

$$\sum_j m^*_{ji} m_{jk} = \delta_{ik} \tag{55}$$

or in other words:

$$m^\dagger m = I^{(s)}. \tag{56}$$
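The freedom to remix a decomposition by an isometry can be demonstrated on a small case. The sketch below assumes NumPy and hypothetical helper names; it starts from the standard four-operator Pauli decomposition of the qubit operation $\rho \mapsto (1-p)\rho + pI/2$, mixes it into six operators with a random $6 \times 4$ matrix $m$ having orthonormal columns, and compares the resulting operation and the total of the effects $A_i^\dagger A_i$.

```python
import numpy as np

def depolarizing_kraus(p):
    """Four operators B_j of the qubit operation rho -> (1-p) rho + p I/2."""
    i2 = np.eye(2, dtype=complex)
    x = np.array([[0, 1], [1, 0]], dtype=complex)
    y = np.array([[0, -1j], [1j, 0]])
    z = np.array([[1, 0], [0, -1]], dtype=complex)
    return [np.sqrt(1 - 3 * p / 4) * i2] + [np.sqrt(p / 4) * s for s in (x, y, z)]

def random_isometry(r, s, rng):
    """r x s matrix with orthonormal columns (requires s <= r)."""
    g = rng.normal(size=(r, s)) + 1j * rng.normal(size=(r, s))
    q, _ = np.linalg.qr(g)             # reduced QR: q is r x s, q^dagger q = I_s
    return q

def remix(kraus_b, m):
    """A_i = sum_j m_ij B_j."""
    return [sum(m[i, j] * b for j, b in enumerate(kraus_b))
            for i in range(m.shape[0])]

def apply_op(kraus, rho):
    return sum(k @ rho @ k.conj().T for k in kraus)

def effects(kraus):
    """POVM elements F_i = A_i^dagger A_i determined by a decomposition."""
    return [k.conj().T @ k for k in kraus]

rng = np.random.default_rng(3)
p = 0.4
b_ops = depolarizing_kraus(p)
m = random_isometry(6, 4, rng)
a_ops = remix(b_ops, m)
g = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
rho = g @ g.conj().T
rho = rho / np.trace(rho)
print(np.linalg.norm(apply_op(a_ops, rho) - apply_op(b_ops, rho)))
print(np.linalg.norm(sum(effects(a_ops)) - np.eye(2)))
```

The two decompositions implement the same operation but generally give different individual effects, which is exactly the freedom the maximization over compatible POVMs exploits.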



Any quantum operation on a system $Q$ may be realized [34] by a "unitary
representation" in which the Hilbert space $Q$ is extended by adjoining an environment $E$
prepared in a standard state $|0^E\rangle$, and the system and environment undergo a unitary interaction, followed by a projection on the environment system. Any such unitary interaction
with a given initial environment state determines a quantum operation. (In the case of a
trace-preserving operation, the environment projection is the identity.) That is,

$$\mathcal{A}(\rho) = \mathrm{tr}_E\big(\pi^E U^{QE} (|0^E\rangle\langle 0^E| \otimes \rho^Q)\, U^{QE\dagger} \pi^E\big). \tag{57}$$

The operators $A_i$ in the operator decomposition representation discussed above turn out to
be the "operator matrix elements"

$$A_i^Q = \langle i^E|U^{QE}|0^E\rangle \tag{58}$$

of the unitary interaction, between the initial environment state and orthonormal environment vectors $|i\rangle$ of the basis used for the partial trace over the environment. The freedom
(54) to "unitarily mix" the operators $A_i$, obtaining another valid decomposition, is just the
freedom to do the environment partial trace in a different environment basis (related to the
first by the transpose of the unitary used in remixing). See [19] for a more extended discussion of this. Here, we merely emphasize that in order to get all decompositions as we vary
the measurement on the environment, it was assumed that the environment was initially in
a pure state.

The import of this for our problem of extracting information about $\psi$ via measurements
compatible with $\mathcal{A}_p$ is that we may vary over the relevant "finegrained" POVMs compatible
with $\mathcal{A}_p$ by imagining we implement $\mathcal{A}_p$ with an initially pure environment, and varying
over all measurements on the environment. We may do this by letting the interaction $U^{QE}$
swap half of a bipartite maximally entangled state from the environment into the system $Q$,
conditional on "quantum dice" loaded with probability $p$ (see the footnote on quantum dice below). Since half (i.e., one subsystem)
of a bipartite maximally entangled state has the uniform density operator $I/d$, this just
replaces the state of $Q$ with the uniform density operator, with probability $p$. In other words,
it effects the isotropic operation with parameter $p$. In more detail, we let the environment
be the $(d^2 + 1)$-dimensional Hilbert space

$$E = (E_1 \otimes E_2) \oplus F, \tag{59}$$

where $E_1 \cong E_2 \cong Q$ are $d$-dimensional and $F$ is a one-dimensional "flag" on which the
swapping is conditioned. We prepare an initial environment state

$$|0^E\rangle = \sqrt{1-p}\,|F\rangle + \sqrt{p}\sum_i \frac{1}{\sqrt{d}}\,|i\rangle^{E_1}|i\rangle^{E_2} \tag{60}$$

and realize the operation $\mathcal{A}_p$ on $Q$ through the unitary interaction:

$$U^{QE} := (\mathrm{SWAP}(E_1, Q) \otimes I^{E_2}) \oplus (I^F \otimes I^Q). \tag{61}$$

SWAP simply swaps the states of $E_1$ and $Q$; it is defined by:

$$\mathrm{SWAP}(E_1, Q) := \sum_{ij} |i\rangle\langle j|^{E_1} \otimes |j\rangle\langle i|^{Q}, \tag{62}$$

so that, overall

$$U^{QE}\, |\psi\rangle^Q |F\rangle = |\psi\rangle^Q |F\rangle, \tag{63}$$

$$U^{QE}\, |\psi\rangle^Q |\phi\rangle^{E_1} |\chi\rangle^{E_2} = |\phi\rangle^Q |\psi\rangle^{E_1} |\chi\rangle^{E_2}. \tag{64}$$

Footnote: "Quantum dice" are usually taken to consist of a pure entangled state of two systems, used as
dice by conditioning operations on some third system on the eigenbasis of one of the two entangled
systems. The resulting operation on the third system has the effect of randomly performing one
of the operations which were performed conditionally, with probabilities given by the eigenvalues
of the reduced density matrix of the entangled state. Below we use a slightly different formulation
which applies to our special case of either doing or not doing some operation. This involves an extra
"flag" dimension of the environment instead of an extra environment qubit. It reduces the required
number of Hilbert space dimensions, because we don't have to have the maximally entangled state
ready for partial swapping even in that subspace where the swapping won't be done, as we would
if we conditioned on a qubit value.

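When $|\psi\rangle$ goes in on the measured system, the final environment state is

$$\rho^E(|\psi\rangle, p) = (1-p)\,|F\rangle\langle F| + p\,|\psi\rangle\langle\psi|^{E_1} \otimes \frac{I^{E_2}}{d}
+ \sqrt{\frac{p(1-p)}{d}}\,\Big( |F\rangle \langle\psi^{E_1}| \langle\bar\psi^{E_2}| + |\psi^{E_1}\rangle |\bar\psi^{E_2}\rangle \langle F| \Big), \tag{65}$$

where $|\bar\psi\rangle$ denotes the vector whose components in the basis $|i\rangle$ are the complex conjugates of those of $|\psi\rangle$.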



Now, any information about the initial state of $Q$ obtainable by a measurement compatible with $\mathcal{A}_p$ may be obtained by measuring the environment $E$ after the above-defined
interaction $U^{QE}$. This is because each such measurement made on the environment after the interaction
corresponds, via the unitary representation of operations, to a decomposition $\{A_b\}$ of the
operation $\mathcal{A}_p$, and thus to a POVM on $Q$ compatible with $\mathcal{A}_p$, and as we vary over all
measurements on an initially pure $E$ we obtain fine-grainings of all such POVM's.

The uniform distribution $\Omega$ for initial states $\psi$ gives rise, via the dynamical evolution
$U^{QE}$, to a distribution $\mu_p$ on final environment states $\rho^E$. The accessible information about
$\rho^E$ is the maximal information obtainable about $\psi$ by measurements on $E$ consistent with
this operation, and hence gives us the maximal information about the initial preparation
consistent with the isotropic operation $\mathcal{A}_p$. As we vary $p$ parametrically, we get the
information-disturbance frontier for the uniform pure-state ensemble $\Omega$.
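The flag-and-swap construction of this section can be built explicitly in small dimension. The following sketch assumes NumPy; the index conventions and function names are illustrative choices, not taken from the text. It constructs the unitary on $Q \otimes E$ as a permutation matrix, extracts the operators $\langle e|U^{QE}|0^E\rangle$ for each environment basis vector $|e\rangle$, and checks that they implement $(1-p)\rho + pI/d$. It also compares the environment output state with the closed form containing the flag term, the $|\psi\rangle\langle\psi| \otimes I/d$ term and the cross terms, under the assumption that the second environment factor carries the complex-conjugated vector.

```python
import numpy as np

# Environment basis labels: e = i*d + j  <->  |i>_E1 |j>_E2, and e = d*d  <->  |F>.
# Joint basis |q>_Q |e>_E has index q*(d*d + 1) + e.

def build_u(d):
    """Swap Q with E1 on the E1 (x) E2 block; act as the identity on the flag."""
    de = d * d + 1
    u = np.zeros((d * de, d * de))
    for q in range(d):
        u[q * de + d * d, q * de + d * d] = 1.0      # |q>|F> -> |q>|F>
        for i in range(d):
            for j in range(d):
                src = q * de + (i * d + j)           # |q>_Q |i>_E1 |j>_E2
                dst = i * de + (q * d + j)           # |i>_Q |q>_E1 |j>_E2
                u[dst, src] = 1.0
    return u

def env_state(d, p):
    """sqrt(1-p)|F> + sqrt(p/d) sum_i |i>|i>."""
    v = np.zeros(d * d + 1)
    v[d * d] = np.sqrt(1 - p)
    for i in range(d):
        v[i * d + i] = np.sqrt(p / d)
    return v

def kraus_from_dilation(d, p):
    """Operators A_e = <e|U|0^E> acting on Q, one per environment basis vector."""
    de = d * d + 1
    u = build_u(d).reshape(d, de, d, de)             # [q_out, e_out, q_in, e_in]
    v = env_state(d, p)
    return [np.tensordot(u[:, e, :, :], v, axes=([2], [0])) for e in range(de)]

def env_output(kraus, psi):
    """Environment state after the interaction: entries <A_f psi | A_e psi>."""
    w = np.array([k @ psi for k in kraus])
    return w @ w.conj().T

def env_formula(psi, p):
    """Closed form for the environment output state (assumed conjugate on E2)."""
    d = len(psi)
    de = d * d + 1
    flag = np.zeros(de, dtype=complex)
    flag[d * d] = 1.0
    pair = np.zeros(de, dtype=complex)
    pair[:d * d] = np.kron(psi, psi.conj())          # |psi>_E1 |conj(psi)>_E2
    block = np.zeros((de, de), dtype=complex)
    block[:d * d, :d * d] = np.kron(np.outer(psi, psi.conj()), np.eye(d)) / d
    c = np.sqrt(p * (1 - p) / d)
    return ((1 - p) * np.outer(flag, flag) + p * block
            + c * (np.outer(flag, pair.conj()) + np.outer(pair, flag)))

d, p = 3, 0.25
rng = np.random.default_rng(4)
psi = rng.normal(size=d) + 1j * rng.normal(size=d)
psi = psi / np.linalg.norm(psi)
rho = np.outer(psi, psi.conj())
kraus = kraus_from_dilation(d, p)
out = sum(k @ rho @ k.conj().T for k in kraus)
print(np.linalg.norm(out - ((1 - p) * rho + p * np.eye(d) / d)))
print(np.linalg.norm(env_output(kraus, psi) - env_formula(psi, p)))
```

Measuring the environment in any other basis corresponds to remixing these operators, so this single construction generates all the fine-grained POVMs compatible with the isotropic operation.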



VIII. SPHERICAL 2-DESIGNS 

All the results of this paper (notably, Theorems |5] and which involve only average 
pure-state fidelities over the uniform ensemble (and not, for instance, information), hold 
also for a class of discrete pure-state ensembles. These ensembles are the spherical t-designs 
for $t \ge 2$ in $(d-1)$-dimensional complex projective space $CP^{d-1}$ (that is, the space of rays
of the $d$-dimensional Hilbert space, isomorphic to the space of pure quantum states $|\psi\rangle\langle\psi|$).
Various equivalent definitions of these designs exist, but the one relevant here is that a
spherical $t$-design is a finite set $A \subset CP^{d-1}$ such that the uniform integral over $CP^{d-1}$ of a
polynomial $P$ of degree no higher than $t$ is equal to the discrete average of the polynomial
evaluated on the points of the design:

$$\int_{CP^{d-1}} d\Omega_\psi\, P(|\psi\rangle\langle\psi|) = \frac{1}{|A|} \sum_{\pi \in A} P(\pi). \tag{66}$$

(As usual, $|S|$ denotes the cardinality of a set $S$.) A reasonably good supply of small (size
quadratic in the dimension) spherical 2-designs exists, and some are given by the following
construction. Define two orthonormal bases to be unbiased [35] or conjugate [36] if any inner
product of a vector from one basis with one from the other has modulus $1/\sqrt{d}$. There exist
sets of "complementary" bases which are higher-dimensional analogues of the eigenbases of
$\sigma_x$, $\sigma_y$, and $\sigma_z$. These are the "mutually unbiased bases" (MUBs) introduced by Ivanovic
[37] (for prime dimension), and by Wootters and Fields [35] (for prime power dimension).
Let the index $k = 0, \ldots, N-1$ specify which basis; $i = 1, \ldots, d$ specifies which vector in the
basis. A set of $N$ orthonormal bases indexed by $k$ is said to be mutually unbiased [35] or
conjugate [36] if for all $k \ne l$

$$|\langle e^k_i | e^l_j \rangle| = 1/\sqrt{d}. \tag{67}$$

For $d = p^n$, $p$ prime, Wootters and Fields constructed $d + 1$ mutually unbiased bases
$e^k_i$. The construction uses the finite field $F_{p^n}$ of prime power order, also known as the Galois
field $GF(p^n)$, which has $p^n$ elements (including zero). For odd primes, the construction is
as follows. One basis may be chosen arbitrarily; in this "standard" basis the $l$-th component
of the $j$-th vector of the $k$-th basis is:

$$(e^k_j)_l = \frac{1}{\sqrt{d}}\, \omega^{\mathrm{Tr}[k l^2 + j l]} \tag{68}$$

where $l, k, j$ range over the $p^n$ elements of $F_{p^n}$,

$$\omega := e^{2\pi i/p} \tag{69}$$

(a primitive $p$-th root of unity) and

$$\mathrm{Tr}[x] := x + x^p + x^{p^2} + \cdots + x^{p^{n-1}}. \tag{70}$$

Note that the trace has values in a subfield of $F_{p^n}$ isomorphic to $F_p$. Verifying that these
are mutually unbiased is a relatively simple calculation using elementary properties of the trace on
finite fields ([38]) and Gauss sums [39]. In particular, the properties $\mathrm{Tr}(x + y) =
\mathrm{Tr}(x) + \mathrm{Tr}(y)$ ($x, y \in F_{p^n}$) and $\mathrm{Tr}(cx) = c\,\mathrm{Tr}(x)$ ($c \in F_p$, $x \in F_{p^n}$) are fundamental.
Wootters and Fields also give a construction for $p = 2$, but it is more complicated and
I will not present it here. Working independently of Wootters and Fields and of Ivanovic,
and using ideas from coding theory and finite geometry, Calderbank, Cameron, Kantor, and
Seidel [40] also found sets of $d + 1$ mutually unbiased bases for prime-power dimension,
which may well be the same as Wootters' and Fields'. (At least some cases were also found by
other authors cited in [40].) Calderbank et al. also state that many unitarily inequivalent
such sets of MUBs must exist. (Constructions are known at least for $d$ a power of 2.) The number $d + 1$
meets an upper bound (valid for arbitrary $d$) on the number of such bases, established by
Delsarte, Goethals, and Seidel [41]. I know of no examples meeting the bound for $d$ with
distinct prime factors.

Theorem 11 The set of $d(d+1)$ vectors $e^k_i$ belonging to the union of the $(d+1)$ mutually
unbiased (aka conjugate) bases constructed by Wootters and Fields for $d$ a power of an odd
prime, is a spherical 2-design in $CP^{d-1}$.

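Proof:

Any second-degree polynomial in $\pi = |\psi\rangle\langle\psi|$ may be written $\sum_a \mathrm{tr}\,\pi A_a\, \mathrm{tr}\,\pi B_a$ for some
finite set of linear operators $A_a$ and $B_a$. (This is shown e.g. in [42] or using Lemma 1 from [43].)
So by (32) we need only show that:

$$T := \frac{1}{d(d+1)} \sum_{ki} |e^k_i\rangle\langle e^k_i| \otimes |e^k_i\rangle\langle e^k_i| = \Pi. \tag{71}$$

First consider the operator made by summing over all bases except
the standard basis: $\Lambda := \sum_{k,i \in F_{p^n}} |e^k_i\rangle\langle e^k_i| \otimes |e^k_i\rangle\langle e^k_i|$. In the standard basis this has matrix
elements

$$\langle\alpha|\langle\gamma|\Lambda|\beta\rangle|\delta\rangle
= \sum_{ki} \langle\alpha|e^k_i\rangle\langle e^k_i|\beta\rangle\langle\gamma|e^k_i\rangle\langle e^k_i|\delta\rangle$$

$$= \frac{1}{d^2} \sum_{ki} \omega^{\mathrm{Tr}(k\alpha^2 + i\alpha) - \mathrm{Tr}(k\beta^2 + i\beta) + \mathrm{Tr}(k\gamma^2 + i\gamma) - \mathrm{Tr}(k\delta^2 + i\delta)}$$

$$= \frac{1}{d^2} \sum_k \omega^{\mathrm{Tr}\, k(\alpha^2 - \beta^2 + \gamma^2 - \delta^2)} \sum_i \omega^{\mathrm{Tr}\, i(\alpha - \beta + \gamma - \delta)}. \tag{72}$$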



We thus have a product of two sums of the form $\sum_{k \in F_{p^n}} \omega^{\mathrm{Tr}\, kx}$. This sum is easily shown to be
equal to $p^n \delta_{x,0}$. (By definition $\delta_{x,0} = 1$ if $x = 0$, $0$ otherwise.) To show it, note that as
$\beta$ ranges over $F_{p^n}$, $\mathrm{Tr}\,\beta$ takes each value in $F_p$ equally often (i.e., $p^{n-1}$ times). As we vary
over $k$, $kx$ for $x \ne 0$ varies over $F_{p^n}$ since $f(k): k \mapsto kx$ is a bijection. So for $x \ne 0$ we can group
the sum into $p^{n-1}$ copies of $\sum_{\eta \in F_p} \omega^{\eta} = 0$, while for $x = 0$ every term equals 1, obtaining overall $p^n \delta_{x,0}$. Thus (72)
becomes $\delta_{\alpha^2 - \beta^2 + \gamma^2 - \delta^2,\,0}\; \delta_{\alpha - \beta + \gamma - \delta,\,0}$. So we have the simultaneous equations:

$$\alpha^2 - \beta^2 + \gamma^2 - \delta^2 = 0, \qquad \alpha - \beta + \gamma - \delta = 0 \tag{73}$$

in $F_{p^n}$. Rewriting these as

$$(\alpha + \beta)(\alpha - \beta) = (\delta + \gamma)(\delta - \gamma) \tag{74}$$

$$(\alpha - \beta) = (\delta - \gamma) \tag{75}$$

we see that any $\alpha, \beta, \gamma, \delta$ satisfying $\alpha = \beta$, $\gamma = \delta$ are solutions, and if one of the latter
conditions holds they both do. If $\alpha \ne \beta$ (and so also $\gamma \ne \delta$), we can (since our arithmetic
is in a field) divide the first equation by the second to get the two equations $\alpha + \beta = \delta + \gamma$,
$\alpha - \beta = \delta - \gamma$, which (since $p$ is odd) are simultaneously satisfied only when $\alpha = \delta$, $\beta = \gamma$. So if we write

$$\Lambda = \sum \Lambda_{\alpha\gamma\beta\delta}\, |\alpha\rangle|\gamma\rangle\langle\beta|\langle\delta| \tag{76}$$

the matrix elements are

$$\Lambda_{\alpha\gamma\beta\delta} = \delta_{\alpha\beta}\delta_{\gamma\delta} + \delta_{\alpha\delta}\delta_{\gamma\beta} \tag{77}$$

except that each of the two terms in (77) gives a unit contribution when $\alpha = \gamma = \beta = \delta$,
while the matrix element $\Lambda_{\alpha\gamma\beta\delta}$ is still unity. However, the full sum in (71), including the
standard basis, just adds an extra copy of precisely this case, so that (up to normalization)
(77) are the matrix elements of $T$ in the standard basis. These matrix elements are precisely
those which define $\Pi$. ■
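For prime dimension the construction can be carried out with ordinary modular arithmetic, which gives a direct check of both mutual unbiasedness and the 2-design property. The sketch below assumes NumPy and treats only the case $n = 1$ (so $d = p$ is an odd prime and the field trace is the identity); the function names are hypothetical. It compares the average of $|v\rangle\langle v| \otimes |v\rangle\langle v|$ over the $d(d+1)$ vectors with the closed form $(I + \mathrm{SWAP})/d(d+1)$ for $\Pi$.

```python
import numpy as np

def wf_bases(p):
    """p + 1 bases of C^p for an odd prime p: the standard basis plus, for each
    k in F_p, vectors with l-th component omega^(k l^2 + j l) / sqrt(p).
    Columns of each returned matrix are the basis vectors."""
    omega = np.exp(2j * np.pi / p)
    l = np.arange(p)
    bases = [np.eye(p, dtype=complex)]
    for k in range(p):
        vecs = [omega ** ((k * l * l + j * l) % p) / np.sqrt(p) for j in range(p)]
        bases.append(np.array(vecs).T)
    return bases

def is_mutually_unbiased(bases, tol=1e-9):
    """Check |<e|f>| = 1/sqrt(d) for vectors taken from different bases."""
    d = bases[0].shape[0]
    for a in range(len(bases)):
        for b in range(a + 1, len(bases)):
            overlaps = np.abs(bases[a].conj().T @ bases[b])
            if np.max(np.abs(overlaps - 1 / np.sqrt(d))) > tol:
                return False
    return True

def design_average(bases):
    """Average of |v><v| (x) |v><v| over all vectors of all bases."""
    d = bases[0].shape[0]
    total = np.zeros((d * d, d * d), dtype=complex)
    for basis in bases:
        for v in basis.T:                      # iterate over columns
            vv = np.kron(v, v)
            total += np.outer(vv, vv.conj())
    return total / (d * len(bases))

def pi_exact(d):
    """(I + SWAP) / (d (d + 1))."""
    swap = np.zeros((d * d, d * d))
    for i in range(d):
        for j in range(d):
            swap[i * d + j, j * d + i] = 1.0
    return (np.eye(d * d) + swap) / (d * (d + 1))

for p in (3, 5):
    bases = wf_bases(p)
    print(p, is_mutually_unbiased(bases),
          np.linalg.norm(design_average(bases) - pi_exact(p)))
```

Prime-power dimensions with $n > 1$ would need genuine finite-field arithmetic and the field trace, which this sketch does not attempt.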
I believe that the mutually unbiased bases defined by Wootters and Fields for $d = 2^n$ also
form spherical designs, but have not shown it. Indeed, it may be that any set of mutually
unbiased bases necessarily forms a spherical 2-design. The converse is true: for a set of
$d(d + 1)$ vectors in $\mathbb{C}^d$ to generate a spherical 2-design in $CP^{d-1}$, it is necessary that they
be a set of $d + 1$ MUBs. (This follows from Theorem 44.9 in [44].)



These designs have an interesting relation to quantum error-correcting codes, and 
are also relevant in cryptography, where they serve to provide a finite ensemble with 
average-disturbance properties similar to those for the uniform ensemble. The information- 
disturbance tradeoff is central to the power of quantum cryptography. The existence of such 
finite ensembles may serve in some cases to allow specification of key or proto-key material 
with a finite amount of information, while retaining the strong average-disturbance proper- 
ties of states completely unknown to one without the key information. For example, these 
bases may serve to define the obvious d(d+ l)-state generalization of the 6-state protocol ( 
m, H) on qubits. 



IX. CONCLUSION 

We have defined and investigated properties of the information-disturbance frontier for 
quantum measurements on an ensemble of states on a finite dimensional Hilbert space, as 
a particular way of formalizing the intuitive notion that quantum mechanics often enforces 
a tradeoff between gaining information and causing disturbance. General properties of the 
frontier, such as its convexity and monotonicity were established. 

Specializing to the important case of the uniform ensemble, representing a complete lack
of knowledge about the initial state, we established further results concerning information 
and disturbance. For any measurement on this ensemble, we showed that a least-disturbing 
way of doing it causes the system to suffer a dynamics, conditional on each measurement 
result, described by a single Hellwig-Kraus operator. We also established that if we restrict 
ourselves to operations for which all Hellwig-Kraus operators are positive (so that they 
represent the square-root conditional dynamics for some measurement), a least-disturbing 
operation compatible with a given measurement, for any ensemble, is to do the square-root
dynamics for that measurement: fine-graining the measurement can never reduce the
disturbance. However, we did not establish this for general conditional dynamics, leaving as 
an interesting open question whether there are non-uniform ensembles for which the least 
disturbing way of doing a particular measurement is for the apparatus to collect additional 
information beyond the measurement outcomes, and use it to aid in attempting to restore the 
initial state. Our main result establishes that this is not so for the uniform ensemble, since an 
optimal instrument for measuring it just implements the square-root conditional dynamics. 
This allows us to calculate, for any measurement, the minimal disturbance compatible with
measuring it on the uniform ensemble. This is only part of what is necessary to find
the information-disturbance frontier, which involves an in general difficult maximization of
accessible information subject to a disturbance constraint. We showed that the maximal
information on the uniform ensemble may be obtained by unitarily covariant measurements
and conditional dynamics. Thus, the overall action of the measurement dynamics on the
state is just that of one of a family of "generalized depolarizing channel" operations, depending on
a single parameter p, which either do nothing to the state or, with probability p, replace it with the maximally
mixed state. It remains only to find the optimal measurement compatible
with the generalized depolarizing channel as a function of that parameter p. Thus the
problem of determining the information-disturbance frontier for the uniform ensemble is
reduced from solving a parametric family of constrained maximization problems to solving
a simpler parametric family of unconstrained ones.

APPENDIX A: SINGLE-TERM CONDITIONAL OPERATIONS MINIMIZE UNIFORM ENTANGLEMENT DISTURBANCE

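Theorem 12 Let $\{F_b\}$ be a POVM and $\mathcal{G} = \sum_b \mathcal{G}_b$, $\mathcal{G}_b \sim \{F_b^{1/2}\}$, and $\mathcal{A} = \sum_b \mathcal{A}_b$, $\mathcal{A}_b \sim \{A_{bi}\}_i$, with
$\sum_i A_{bi}^\dagger A_{bi} = F_b$, be trace-preserving operations. Then

$$F_e(I/d, \mathcal{A}) \le F_e(I/d, \mathcal{G}). \tag{A1}$$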

Proof: We decompose $\mathcal{A}_b$ into the composition of two operations: a trace-decreasing
operation $\mathcal{G}_b$ defined by:

$$\mathcal{G}_b(\rho) = F_b^{1/2} \rho F_b^{1/2} \tag{A2}$$

and an operation $\mathcal{B}_b$ (which is trace-preserving on the support of $F_b$) defined by:

$$\mathcal{B}_b(\rho) = \sum_i B_{bi}\, \rho\, B_{bi}^\dagger, \tag{A3}$$

where $B_{bi} = A_{bi} F_b^{-1/2}$. ($F_b$ may not be invertible; in this case, $F_b^{-1/2}$ refers to the square root
of the generalized inverse of $F_b$. The generalized inverse is the inverse on $F_b$'s support (where
it is invertible) extended (as a direct sum) by the zero operator on the orthocomplement of
the support.) It is easily seen that $\mathcal{B}_b$ is trace-preserving on $F_b$'s support ($\sum_i B_{bi}^\dagger B_{bi} = \Pi_b$,
where $\Pi_b$ is the projector onto the support of $F_b$), and that $\mathcal{B}_b \circ \mathcal{G}_b = \mathcal{A}_b$.
Then

$$F_e(I/d, \mathcal{A}) = \frac{1}{d^2} \sum_{bi} |\mathrm{tr}\, B_{bi} F_b^{1/2}|^2. \tag{A4}$$

By the Schwarz inequality,

$$|\mathrm{tr}\, B_{bi} F_b^{1/2}|^2 = |\mathrm{tr}\, B_{bi} F_b^{1/4} F_b^{1/4}|^2 \le (\mathrm{tr}\, B_{bi} F_b^{1/2} B_{bi}^\dagger)(\mathrm{tr}\, F_b^{1/2}), \tag{A5}$$

so by the trace-preserving property for each $\mathcal{B}_b$,

$$F_e(I/d, \mathcal{A}) \le \frac{1}{d^2} \sum_b |\mathrm{tr}\, F_b^{1/2}|^2 = F_e(I/d, \mathcal{G}). \tag{A6}$$

This is just the entanglement fidelity for the uniform density operator when the operation
$\mathcal{G}$ corresponding to the generalized Lüders' rule is used. Hence the generalized projection
postulate minimizes disturbance to the entanglement of the uniform density operator.
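The inequality of this appendix can be illustrated on random instances. This is a sketch under stated assumptions, not part of the proof: it assumes NumPy, restricts to full-rank effects so that no generalized inverse is needed, and uses hypothetical helper names. Each multi-term operation is built as a random trace-preserving operation applied after the square-root dynamics, following the decomposition used above.

```python
import numpy as np

def psd_power(a, power):
    """Power of a Hermitian positive definite matrix via eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * np.clip(w, 1e-15, None) ** power) @ v.conj().T

def random_povm(d, n, rng):
    """n full-rank effects F_b = S^(-1/2) G_b S^(-1/2) with S = sum_b G_b."""
    gs = []
    for _ in range(n):
        g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
        gs.append(g @ g.conj().T)
    s_inv_half = psd_power(sum(gs), -0.5)
    return [s_inv_half @ g @ s_inv_half for g in gs]

def random_channel_kraus(d, terms, rng):
    """Operators B_i with sum_i B_i^dagger B_i = I, cut from a random isometry."""
    g = rng.normal(size=(terms * d, d)) + 1j * rng.normal(size=(terms * d, d))
    q, _ = np.linalg.qr(g)                     # (terms*d) x d, q^dagger q = I_d
    return [q[i * d:(i + 1) * d, :] for i in range(terms)]

def fe_uniform(kraus):
    """Entanglement fidelity for rho = I/d: (1/d^2) sum_i |tr A_i|^2."""
    d = kraus[0].shape[0]
    return sum(abs(np.trace(k)) ** 2 for k in kraus) / d ** 2

rng = np.random.default_rng(5)
d = 3
povm = random_povm(d, 4, rng)
sqrt_ops = [psd_power(f, 0.5) for f in povm]   # single-term square-root dynamics
multi_ops = []
for root in sqrt_ops:                          # A_bi = B_bi F_b^(1/2)
    multi_ops += [b @ root for b in random_channel_kraus(d, 3, rng)]
fe_multi, fe_sqrt = fe_uniform(multi_ops), fe_uniform(sqrt_ops)
print(fe_multi, fe_sqrt)
```

Since the post-processing operations here are drawn at random, the multi-term fidelity typically falls well below the square-root value; the theorem says it can never exceed it.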



APPENDIX B: MORE ON ONE-TERM VERSUS MULTI-TERM CONDITIONAL OPERATIONS

Here I consider some other approaches towards proving Theorem These have so far 
proven unsuccessful except in the case in which all POVM elements commute. They are 
still of some interest in that they attempt to establish intermediate results stronger than 
Proposition [| 

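In the one-term conditional dynamics case, we had:

$$\left| \langle\psi| A_b |\psi\rangle \right|^2 \le \left| \langle\psi| F_b^{1/2} |\psi\rangle \right|^2. \tag{B1}$$

In the multiple-term conditional dynamics case, we might hope to establish that

$$\sum_i \left| \langle\psi| A_{bi} |\psi\rangle \right|^2 \le \left| \langle\psi| F_b^{1/2} |\psi\rangle \right|^2. \tag{B2}$$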

If $A_{bi}$ is assumed positive (as it is for the conditional dynamics which are minimally
disturbing to the uniform ensemble), this follows from Proposition ??, but we might try
to establish (B2) without that assumption. (There is no hope of establishing that $U_{bi} = I$,
i.e. $A_{bi}$ positive, is minimally disturbing for an arbitrary ensemble; it is obviously not true,
for example, when the ensemble has all probability concentrated on one state $|\psi\rangle$.)
Defining $\mathcal{B}_b$ and $B_{bi}$ as in Appendix A,

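$$\left| \langle\psi| A_{bi} |\psi\rangle \right|^2 = \left| \langle\psi| B_{bi} F_b^{1/2} |\psi\rangle \right|^2 = \left| \langle\psi| B_{bi} F_b^{1/4} F_b^{1/4} |\psi\rangle \right|^2. \tag{B3}$$

Applying the Schwarz inequality as before gives

$$\left| \langle\psi| A_{bi} |\psi\rangle \right|^2 \le \langle\psi| B_{bi} F_b^{1/2} B_{bi}^\dagger |\psi\rangle \, \langle\psi| F_b^{1/2} |\psi\rangle. \tag{B4}$$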

If the inner product were a trace, as before, we would just cycle $B_{bi}^\dagger$ next to $B_{bi}$ and then
sum on $i$ to get the identity, removing the $B$'s entirely and establishing equation (B2).
Unfortunately, we cannot do that here unless $B_{bi}$ commutes with $F_b$. Nor is it clear we can
cycle one of the $F_b^{1/4}$ in (B3) around to give $|\langle\psi| F_b^{1/4} B_{bi} F_b^{1/4} |\psi\rangle|^2$, which would have given
rise to the desired ordering after the Schwarz inequality was applied. To proceed from (B4)
means we are trying to show that:

$$\sum_i \langle\psi| B_{bi} F_b^{1/2} B_{bi}^\dagger |\psi\rangle \le \langle\psi| F_b^{1/2} |\psi\rangle, \tag{B5}$$

using the fact that $\sum_i B_{bi}^\dagger B_{bi} = I$. However, counterexamples to

$$\langle\psi| \mathcal{E}(G) |\psi\rangle \le \langle\psi| G |\psi\rangle \tag{B6}$$

for trace-preserving $\mathcal{E}$ and $0 \le G \le I$ are easily found. For example, let $G$ be proportional
to a projector onto some state other than $|\psi\rangle$ and let $\mathcal{E}$ unitarily rotate that state back to
$|\psi\rangle$. Since $\mathcal{B}_b$ may be an arbitrary trace-preserving operation, this means that the Schwarz
inequality as applied to obtain equation (B4) is too loose for our purposes, and we must
work with (B3), summed over $i$.
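The counterexample can be written out explicitly for a qubit. This is a minimal sketch with assumed concrete choices: $|\psi\rangle$ is the first basis vector, $G$ is one half times the projector onto the second basis vector, and the trace-preserving operation is conjugation by the swap unitary. The variable names are illustrative only.

```python
def matmul(a, b):
    """Multiply two 2x2 real matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(a):
    """Transpose of a 2x2 real matrix (the adjoint, since entries are real)."""
    return [[a[j][i] for j in range(2)] for i in range(2)]

def expectation(psi, op):
    """<psi|op|psi> for a real vector psi and a real 2x2 matrix op."""
    return sum(psi[i] * op[i][j] * psi[j] for i in range(2) for j in range(2))

psi = [1.0, 0.0]                # the state |psi>
G = [[0.0, 0.0], [0.0, 0.5]]    # 0.5 times the projector onto the state orthogonal to |psi>
U = [[0.0, 1.0], [1.0, 0.0]]    # swap: rotates that orthogonal state onto |psi>

# The unitary operation E(G) = U G U^dagger is trace-preserving.
rotated_G = matmul(matmul(U, G), transpose(U))

lhs = expectation(psi, rotated_G)   # <psi|E(G)|psi>
rhs = expectation(psi, G)           # <psi|G|psi>
print(lhs, rhs, lhs > rhs)          # 0.5 0.0 True
```

Here the left-hand side of the proposed inequality is 0.5 while the right-hand side vanishes, so the inequality fails for this choice of $G$ and $\mathcal{E}$.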

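In fact, the counterexample to (B6) given above also shows that even this will not work:
there is no hope of establishing equation (B2), because it is equivalent to:

$$\langle\psi| \mathcal{B}(F) |\psi\rangle \, \langle\psi| F |\psi\rangle \le \langle\psi| F |\psi\rangle \, \langle\psi| F |\psi\rangle. \tag{B7}$$

Rather, we might try to show that

$$\langle\psi| \mathcal{B}(F) |\psi\rangle \, \langle\psi| F |\psi\rangle \le \max_{\text{unitary } U} \langle\psi| U \mathcal{F}(|\psi\rangle\langle\psi|) U^\dagger |\psi\rangle, \tag{B8}$$

where $\mathcal{B}$ is arbitrary and trace-preserving and $\mathcal{F} \sim \{F\}$, $I \ge F \ge 0$.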